The 2016 presidential race was an attention-grabbing event. We may glean some interesting information by analyzing data on financial contributions to the relevant campaigns. California is a prominent state, so it may be fruitful to examine contributions associated with the state’s population. Granted, the data may not be representative of the nation, as California is widely viewed as a strongly “blue” state.
We’ll start by loading the required packages and reading in the data. For this task, we’ll use read_csv from the readr package for convenience. We’ll also read in the data for financial contributions to 2016 presidential campaigns from all of the United States to serve as a reference point.
## cmte_id cand_id cand_nm
## Length:7440252 Length:7440252 Length:7440252
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## contbr_nm contbr_city contbr_st
## Length:7440252 Length:7440252 Length:7440252
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## contbr_zip contbr_employer contbr_occupation
## Length:7440252 Length:7440252 Length:7440252
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## contb_receipt_amt contb_receipt_dt receipt_desc
## Min. : -93308 Length:7440252 Length:7440252
## 1st Qu.: 15 Class :character Class :character
## Median : 28 Mode :character Mode :character
## Mean : 126
## 3rd Qu.: 94
## Max. :12777706
## memo_cd memo_text form_tp
## Length:7440252 Length:7440252 Length:7440252
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## file_num tran_id election_tp
## Min. :1003942 Length:7440252 Length:7440252
## 1st Qu.:1077916 Class :character Class :character
## Median :1098663 Mode :character Mode :character
## Mean :1101464
## 3rd Qu.:1133832
## Max. :1146285
The following are our columns, according to the provider of this data. The datasets used are from the same provider and have the same structure. We’ll also add to these descriptions what we found in the above using the summary() function:
CMTE_ID COMMITTEE ID A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. The contents of this column are strings.
CAND_ID CANDIDATE ID A 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. The contents of this column are strings.
CAND_NM CANDIDATE NAME Reported name of the candidate. The contents of this column are strings.
CONTBR_NM CONTRIBUTOR NAME Reported name of the contributor. The contents of this column are strings.
CONTBR_CITY CONTRIBUTOR CITY Reported city of the contributor. The contents of this column are strings.
CONTBR_ST CONTRIBUTOR STATE Reported state of the contributor. The contents of this column are strings.
CONTBR_ZIP CONTRIBUTOR ZIP CODE Reported zip code of the contributor. The contents of this column are strings but effectively integers. It doesn’t make sense to run summary(as.numeric(all.contribs$contbr_zip)), though, as the values are categorical. Valid values are 5 or 9 digits in length, though we see a number of invalid entries as well.
CONTBR_EMPLOYER CONTRIBUTOR EMPLOYER Reported employer of the contributor. The contents of this column are strings.
CONTBR_OCCUPATION CONTRIBUTOR OCCUPATION Reported occupation of the contributor. The contents of this column are strings.
CONTB_RECEIPT_AMT CONTRIBUTION RECEIPT AMOUNT Reported contribution amount in US dollars (USD). The contents of this column are numeric quantities (more specifically, positive and negative floats, as some entries represent refunds and the like whereas others represent positive contributions). We see a significant disparity between the median (28) and the mean (126) as well as a large maximum value (12777706), so the data is heavily skewed.
CONTB_RECEIPT_DT CONTRIBUTION RECEIPT DATE Reported contribution receipt date. The date format is DD-MMM-YYYY. The contents of this column are strings that may more usefully be converted into date objects of some sort. Specifically, we may use the as.yearmon function from zoo to categorize by month and year, the as.Date function, or a lubridate parser (dmy matches a day-first format such as this one, whereas ymd expects the year first).
RECEIPT_DESC RECEIPT DESCRIPTION Additional information reported by the committee about a specific contribution. The contents of this column are strings.
MEMO_CD MEMO CODE ‘X’ indicates the reporting committee has provided additional text to describe a specific contribution. The contents of this column are strings.
MEMO_TEXT MEMO TEXT Additional information reported by the committee about a specific contribution. The contents of this column are strings.
FORM_TP FORM TYPE Indicates what schedule and line number the reporting committee reported a specific transaction.
SA17A: Form 3P Schedule A Line 17A SA18: Form 3P Schedule A Line 18 SB28A: Form 3P Schedule B Line 28A
The contents of this column are strings.
FILE_NUM FILE NUMBER A unique number assigned to a report and all its associated transactions.
The contents of this column are numeric (specifically, integers ranging from 1003942 to 1146285)
TRAN_ID TRANSACTION ID A unique identifier permanently associated with each itemization or transaction appearing in an FEC electronic file. The contents of this column are strings.
ELECTION_TP ELECTION TYPE/PRIMARY GENERAL INDICATOR This code indicates the election for which the contribution was made, in the form EYYYY (election code plus election year): P = Primary, G = General, O = Other, C = Convention, R = Runoff, S = Special, E = Recount. The contents of this column are strings.
Let’s look at a subset of this data.
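As a minimal sketch of the zip code check described for CONTBR_ZIP, assuming the zip codes are held as character strings, one could flag entries that are not exactly 5 or 9 digits long. The vector `zips` below is made-up illustration data, not values taken from the dataset.

```r
# Hypothetical sample of zip codes stored as strings (illustration only)
zips <- c("94110", "941101234", "9411", "ABCDE", NA, "94110-1234")

# A valid entry is exactly 5 digits, optionally followed by 4 more digits
is_valid_zip <- !is.na(zips) & grepl("^[0-9]{5}([0-9]{4})?$", zips)

# The first five digits are usually enough for geographic grouping
zip5 <- ifelse(is_valid_zip, substr(zips, 1, 5), NA_character_)

print(data.frame(zips, is_valid_zip, zip5))
```

Keeping the column as character (rather than converting to numeric) also preserves leading zeros, which matter for zip codes in some states.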
## # A tibble: 12 x 18
## cmte_id cand_id cand_nm contbr_nm
## <chr> <chr> <chr> <chr>
## 1 C00575795 P00003392 Clinton, Hillary Rodham AULL, ANNE
## 2 C00575795 P00003392 Clinton, Hillary Rodham CARROLL, MARYJEAN
## 3 C00575795 P00003392 Clinton, Hillary Rodham GANDARA, DESIREE
## 4 C00580100 P80001571 Trump, Donald J. RODACY, JON
## 5 C00577130 P60007168 Sanders, Bernard LEE, ALAN
## 6 C00580100 P80001571 Trump, Donald J. ROONEY, JULIANNE
## 7 C00577130 P60007168 Sanders, Bernard LEONELLI, ODETTE
## 8 C00577130 P60007168 Sanders, Bernard LEONELLI, ODETTE
## 9 C00577130 P60007168 Sanders, Bernard LEOPARD, PATTI
## 10 C00575795 P00003392 Clinton, Hillary Rodham HOFER, VIRGINIA
## 11 C00580100 P80001571 Trump, Donald J. ROPPA, RICH
## 12 C00580100 P80001571 Trump, Donald J. SHARP, PATRICIA M. MRS.
## # ... with 14 more variables: contbr_city <chr>, contbr_st <chr>,
## # contbr_zip <int>, contbr_employer <chr>, contbr_occupation <chr>,
## # contb_receipt_amt <dbl>, contb_receipt_dt <chr>, receipt_desc <chr>,
## # memo_cd <chr>, memo_text <chr>, form_tp <chr>, file_num <int>,
## # tran_id <chr>, election_tp <chr>
It looks like null values appear in several forms in this data. In the contbr_employer column, we see the string “N/A”, for instance where the individual has retired. In the receipt_desc column, we see NA values. Let’s rectify that.
Let’s take a look at the first column, cmte_id. Each candidate has a corresponding committee, so the number of committees should equal the number of candidates (test 1), we should see a commensurate number of observations of each committee and of each candidate (test 2), and vice versa (test 3). Let’s test these.
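One way to make the missing-value markers uniform is to declare them at read time. The sketch below uses base R’s read.csv with a few made-up rows standing in for the real file; the exact set of markers present in the real data is an assumption. (readr’s read_csv accepts a similar `na` argument.)

```r
# Hypothetical rows standing in for the real file (illustration only)
csv_text <- 'contbr_nm,contbr_employer,receipt_desc
"DOE, JANE",N/A,
"ROE, RICHARD",ACME CORP,Refund
"POE, ALEX",,NA'

# Treat "N/A", "NA", and empty strings uniformly as missing at read time
contribs <- read.csv(text = csv_text, na.strings = c("N/A", "NA", ""),
                     stringsAsFactors = FALSE)

# Count missing values per column to confirm the recoding
print(colSums(is.na(contribs)))
```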
## [1] "Conducting test 1..."
## [1] "Test 1 passed"
## [1] "Conducting test 2..."
## [1] "Test 2 passed"
## [1] "Conducting test 3..."
## [1] "Test 3 passed"
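The three checks could be sketched roughly as follows. This is one possible reading of the tests, not necessarily the code that produced the output above: it treats tests 2 and 3 as checks that the committee-to-candidate mapping is one-to-one in each direction. The data frame is toy data.

```r
# Toy data: two committees, each tied to exactly one candidate (illustration only)
contribs <- data.frame(
  cmte_id = c("C001", "C001", "C002", "C002", "C002"),
  cand_id = c("P001", "P001", "P002", "P002", "P002"),
  stringsAsFactors = FALSE
)

# Test 1: as many distinct committees as distinct candidates
test1 <- length(unique(contribs$cmte_id)) == length(unique(contribs$cand_id))

# Distinct (committee, candidate) pairs that actually occur in the data
pairs <- unique(contribs[, c("cmte_id", "cand_id")])

# Test 2: every committee maps to exactly one candidate
test2 <- !any(duplicated(pairs$cmte_id))

# Test 3: every candidate maps to exactly one committee
test3 <- !any(duplicated(pairs$cand_id))

print(c(test1 = test1, test2 = test2, test3 = test3))
```

If the mapping is one-to-one, the observation counts per committee and per candidate necessarily agree, which is the property the tests are after.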
Terrific. We’d like to consider contributions from California. First, let’s see if contributions in California are somewhat similar to those made in the entire nation. For this purpose, we’ll ignore invalid states such as “20” and “30”, as these do not seem significant. The most obviously incorrect recorded state, “ZZ”, has 8670 associated contributions, but the rest typically have fewer than 10 associated contributions, so it doesn’t seem this decision will render our analysis significantly incorrect.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03409 0.07892 0.11026 0.12003 0.13250 0.27782
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01837 0.09116 0.11091 0.13101 0.17432 0.30181
From the above we see that contributions in California differ significantly from contributions made in all of the United States. Some differences are in accordance with what may be expected from the reputation of California as a strongly blue state. For instance, more contributions were made to Clinton and to Sanders than to almost any Republican candidate. What is strange is that a large number and amount of contributions were made in California, a stereotypically Democratic state, to Fiorina, a Republican candidate, relative to Democratic and Independent candidates. We also see that those responsible for these contributions tended to make larger contributions.
Californians also seem responsible for a disproportionate number of contributions and amount contributed, especially to Lawrence Lessig and Jill Stein. Californians made 30% of contributions to Lessig and Stein, and they were regularly responsible for close to 10% (sometimes less, but often more) of contributions to other candidates, even though California is just one of fifty states (more than fifty jurisdictions if you also include American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and/or the U.S. Virgin Islands).
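The per-candidate California shares discussed above could be computed along these lines. This is a sketch on toy data; the object name `all_contribs` is illustrative, and the column names follow the dataset description.

```r
# Toy nationwide data (illustration only; not the real counts)
all_contribs <- data.frame(
  cand_nm   = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  contbr_st = c("CA", "NY", "TX", "CA", "CA", "TX", "NY", "FL", "WA", "OH"),
  contb_receipt_amt = c(50, 100, 25, 25, 10, 20, 30, 40, 50, 50),
  stringsAsFactors = FALSE
)

is_ca <- all_contribs$contbr_st == "CA"

# Share of each candidate's contributions (by count) that came from California
count_share <- tapply(is_ca, all_contribs$cand_nm, mean)

# Share of each candidate's dollars that came from California
amount_share <-
  tapply(all_contribs$contb_receipt_amt * is_ca, all_contribs$cand_nm, sum) /
  tapply(all_contribs$contb_receipt_amt, all_contribs$cand_nm, sum)

print(round(cbind(count_share, amount_share), 3))
```

Note that refunds (negative amounts) would pull the dollar share around, so one may prefer to filter to positive amounts first.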
This graph further shows that relative to much of the rest of the nation, California seems relatively politically active and strongly blue. Jindal and Lessig both withdrew their respective campaigns quite early (in 2015) and should be relatively unknown in California, but contributions from California already make up nearly a third of all contributions to Lessig and only a relatively small fraction of contributions to Jindal. The significant difference suggests strong Democratic preferences of Californians.
Now, let’s consider the number of contributions (associated with the state of California) made for each candidate.
##
## Bush, Jeb Carson, Benjamin S.
## 3130 27370
## Christie, Christopher J. Clinton, Hillary Rodham
## 333 688524
## Cruz, Rafael Edward 'Ted' Fiorina, Carly
## 57822 4706
## Gilmore, James S III Graham, Lindsey O.
## 3 347
## Huckabee, Mike Jindal, Bobby
## 531 31
## Johnson, Gary Kasich, John R.
## 1758 3005
## Lessig, Lawrence McMullin, Evan
## 372 306
## O'Malley, Martin Joseph Pataki, George E.
## 397 20
## Paul, Rand Perry, James R. (Rick)
## 4279 116
## Rubio, Marco Sanders, Bernard
## 14095 407164
## Santorum, Richard J. Stein, Jill
## 91 2842
## Trump, Donald J. Walker, Scott
## 86258 740
## Webb, James Henry Jr.
## 106
As one might expect, knowing the stereotypical political leaning of Californians, Clinton received the most contributions. She even received 69% more contributions than Sanders, who was often said during the time to have received an inordinately large number of contributions. The most popular Republican, Donald Trump, received about 1/8th as many contributions as Clinton.
We also see that the majority of Republican candidates received far fewer contributions. As an extreme example, James S. Gilmore only received a measly 3 contributions.
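The ratios quoted above can be checked directly from the counts in the table. The variable names below are ours; the numbers are copied from the output.

```r
# Counts of California contributions, copied from the table above
n_clinton <- 688524
n_sanders <- 407164
n_trump   <- 86258

# Clinton relative to Sanders, as a percentage increase
pct_more <- (n_clinton / n_sanders - 1) * 100

# Trump relative to Clinton, as a fraction
trump_fraction <- n_trump / n_clinton

print(round(c(pct_more = pct_more, trump_fraction = trump_fraction), 3))
```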
One may be curious how much each candidate received in contributions, and how these figures compare to the number of contributions received.
## [1] "Number of Contributions"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 306 740 52174 4706 688524
## [1] "Amount Contributed (USD"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8100 186144 495231 6064153 2912555 93681171
Again, we see a much larger mean than median for both the number of contributions and the amount contributed. This indicates that the number of contributions per candidate is heavily skewed. In light of this, it should be expected that contribution amounts will also be heavily skewed.
After viewing the plot, it seems one can reliably say that candidates who received more contributions also had a larger sum of money contributed to them, rather than that some candidates had overwhelming numbers of small or large contributions. Particularly interesting is James S. Gilmore, who received far fewer contributions than any other candidate (just 3) but had a comparable sum of money (on the log10 scale) contributed to him.
## Candidate Variable Value
## 7 Gilmore Freq 3
## 32 Gilmore Contribution.Amount 8100
It seems our above plot was excessively crowded, such that the x-axis is not very intelligible. Let’s only consider a subset of candidates: “Bush, Jeb”, “Clinton, Hillary Rodham”, “Cruz, Rafael Edward ‘Ted’”, “Rubio, Marco”, “Sanders, Bernard”, and “Trump, Donald J.” and disregard other candidates, including “Gilmore, James S III”.
## [1] "Number of Contributions"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3130 25027 72040 209499 326938 688524
## [1] "Amount Contributed (USD"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3300292 5061916 9868532 23530225 18219470 93681171
Again, we see a wide range of contribution frequencies and amounts, though not as large as previously (when we considered all candidates). The disparities between mean and median values of contribution frequency and amount also decreased but still seem large.
Frankly, nothing in particular seems to stand out from the above graph. One might argue that the contributions to some candidates were, on average, larger than the contributions to others, such as contributions to Sanders relative to those to Trump, but the differences don’t seem too noteworthy, especially since we scaled the values quite substantially. It seems more apt to look at median contribution amounts (dollar amounts are heavily skewed, as extreme values are present and render figures deceptive; we’ll also calculate mean amounts for reference) and at proportions for that purpose. In the output below, the first listing gives medians and the second gives means, although both are labeled “average contribution amount”.
## [1] "Bush, Jeb average contribution amount: $500"
## [1] "Carson, Benjamin S. average contribution amount: $50"
## [1] "Christie, Christopher J. average contribution amount: $1000"
## [1] "Clinton, Hillary Rodham average contribution amount: $25"
## [1] "Cruz, Rafael Edward 'Ted' average contribution amount: $50"
## [1] "Fiorina, Carly average contribution amount: $100"
## [1] "Gilmore, James S III average contribution amount: $2700"
## [1] "Graham, Lindsey O. average contribution amount: $1000"
## [1] "Huckabee, Mike average contribution amount: $50"
## [1] "Jindal, Bobby average contribution amount: $250"
## [1] "Johnson, Gary average contribution amount: $100"
## [1] "Kasich, John R. average contribution amount: $100"
## [1] "Lessig, Lawrence average contribution amount: $250"
## [1] "McMullin, Evan average contribution amount: $100"
## [1] "O'Malley, Martin Joseph average contribution amount: $250"
## [1] "Pataki, George E. average contribution amount: $1000"
## [1] "Paul, Rand average contribution amount: $50"
## [1] "Perry, James R. (Rick) average contribution amount: $2700"
## [1] "Rubio, Marco average contribution amount: $75"
## [1] "Sanders, Bernard average contribution amount: $27"
## [1] "Santorum, Richard J. average contribution amount: $53.8"
## [1] "Stein, Jill average contribution amount: $100"
## [1] "Trump, Donald J. average contribution amount: $80"
## [1] "Walker, Scott average contribution amount: $250"
## [1] "Webb, James Henry Jr. average contribution amount: $275"
## [1] "Bush, Jeb average contribution amount: $1054.40633546326"
## [1] "Carson, Benjamin S. average contribution amount: $106.414139568871"
## [1] "Christie, Christopher J. average contribution amount: $1369.56756756757"
## [1] "Clinton, Hillary Rodham average contribution amount: $136.06086463217"
## [1] "Cruz, Rafael Edward 'Ted' average contribution amount: $99.1090289163294"
## [1] "Fiorina, Carly average contribution amount: $308.263795155121"
## [1] "Gilmore, James S III average contribution amount: $2700"
## [1] "Graham, Lindsey O. average contribution amount: $1194.51008645533"
## [1] "Huckabee, Mike average contribution amount: $434.822222222222"
## [1] "Jindal, Bobby average contribution amount: $749.395483870968"
## [1] "Johnson, Gary average contribution amount: $281.701114903299"
## [1] "Kasich, John R. average contribution amount: $505.68037936772"
## [1] "Lessig, Lawrence average contribution amount: $500.388440860215"
## [1] "McMullin, Evan average contribution amount: $164.495098039216"
## [1] "O'Malley, Martin Joseph average contribution amount: $751.471687657431"
## [1] "Pataki, George E. average contribution amount: $1522.5"
## [1] "Paul, Rand average contribution amount: $180.167863986913"
## [1] "Perry, James R. (Rick) average contribution amount: $1796.55172413793"
## [1] "Rubio, Marco average contribution amount: $343.31278609436"
## [1] "Sanders, Bernard average contribution amount: $48.1963845281017"
## [1] "Santorum, Richard J. average contribution amount: $400.877802197802"
## [1] "Stein, Jill average contribution amount: $264.52660450387"
## [1] "Trump, Donald J. average contribution amount: $162.377765424656"
## [1] "Walker, Scott average contribution amount: $678.651216216216"
## [1] "Webb, James Henry Jr. average contribution amount: $722.341132075472"
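For reference, per-candidate medians and means like those listed above could be produced as follows. This is a sketch on toy data with illustrative object names, and it labels each statistic explicitly so the two listings cannot be confused.

```r
# Toy contributions (illustration only); one large gift skews candidate "A"
ca_contribs <- data.frame(
  cand_nm = c("A", "A", "A", "A", "B", "B", "B"),
  contb_receipt_amt = c(25, 25, 50, 2700, 100, 250, 250),
  stringsAsFactors = FALSE
)

# Split amounts by candidate, then summarize each group
by_cand <- split(ca_contribs$contb_receipt_amt, ca_contribs$cand_nm)
med <- sapply(by_cand, median)
avg <- sapply(by_cand, mean)

for (cand in names(by_cand)) {
  cat(sprintf("%s median contribution amount: $%.2f\n", cand, med[[cand]]))
  cat(sprintf("%s mean contribution amount: $%.2f\n", cand, avg[[cand]]))
}
```

In the toy data, a single 2700 contribution lifts candidate A’s mean far above the median, which is the same effect seen for several candidates in the real output.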
As previously stated, California is notoriously a strongly “blue” state. Differences between contributions to candidates in the same party seem to be of greater import. Let’s start by looking at Republican candidates.
It seems the most obscure and the most well-established candidates received the largest contributions. James S. Gilmore, the least “successful” candidate in the election, had, by far, the largest median contribution amount. It seems somewhat surprising that Carly Fiorina and Marco Rubio received relatively diminutive contributions. It seems rather trivial, however, that the most obscure and most well-established candidates will receive larger contributions. Such is to be expected, but one might be glad to have this numerical support of the principle.
The most prominent independent candidates, Gary Johnson (Libertarian Party) and Jill Stein (Green Party), also received relatively small average contributions. The mean size of contributions to Trump was smaller than that to every other Republican candidate, save for Ted Cruz and Ben Carson.
On the Democratic side, Clinton and Sanders both tended to receive much smaller contributions. If we look at median amounts, those received by the two candidates are similar, though the mean amount Sanders enjoyed was just over 1/3 (about 35%, to be exact) of that enjoyed by Clinton. Both candidates received relatively large contributions that dragged up their mean contribution sizes, but Clinton seems to have received either far larger or far more relatively large contributions, as her mean contribution size was dragged up much more. This isn’t too surprising if one considers that her campaign was, in a sense, about twice as long as that of Sanders, given that she won the primary election and Sanders lost. At least as far as California goes, it seems the typical contributions that the two received are very similar. Let’s look at more specific graphs to confirm this hypothesis. We’ll look at the density of positive contribution amounts, as negative amounts (e.g., because of refunds, errors, etc.) may render some statistics, such as a portion of those in five-number summaries, deceptive.
## contb_receipt_amt
## Min. :-5400.0
## 1st Qu.: 15.0
## Median : 25.0
## Mean : 136.1
## 3rd Qu.: 100.0
## Max. :10000.0
## contb_receipt_amt
## Min. :-10500.0
## 1st Qu.: 15.0
## Median : 27.0
## Mean : 48.2
## 3rd Qu.: 50.0
## Max. : 10000.0
Indeed, it seems the two received similar contributions in terms of size. We see peaks at similar points (e.g., for 50, 100, 200, 250, etc.). To be fair, whereas both candidates saw many of the same peaks, it seems that the higher peaks (e.g., 50, 150, 200, 250, 500, etc.) accounted for a greater proportion of contributions to Clinton than of contributions to Sanders. Contributions of 10 and 50 comprised a greater proportion of contributions to Sanders than of contributions to Clinton, so the trend is not completely consistent, but these might be compensated for by the fact that a greater portion of contributions to Clinton were for 5, 50, 200, and so on. It seems that, for the most part, financial contributions to Sanders were smaller in size. If we view the same plot for contributions across the entire nation and use much larger bins, we may draw the same conclusion: both enjoyed somewhat similar contribution sizes, but larger contributions comprise a larger proportion of contributions to Clinton than to Sanders.
## contb_receipt_amt
## Min. : -20000
## 1st Qu.: 15
## Median : 25
## Mean : 147
## 3rd Qu.: 100
## Max. :12777706
## contb_receipt_amt
## Min. :-93308.00
## 1st Qu.: 13.50
## Median : 27.00
## Mean : 44.71
## 3rd Qu.: 50.00
## Max. : 10000.00
Another Democratic candidate, Jim Webb, enjoyed substantially larger average and median contribution amounts than both Clinton and Sanders, but he did not seem significant in the race.
It seems that contributors to Republicans are more affluent and/or generous, though. It doesn’t seem surprising that the most seemingly significant Democratic candidates, Clinton and Sanders, received substantially more contributions than any other candidate in the strongly “blue” state of California. It is surprising, however, that the most prominent Democratic candidates received relatively small contributions on average, whereas Jim Webb received a much larger average contribution amount.
Let’s compare this to financial contributions from all of the United States to see whether any of these attributes are mirrored in the rest of the country.
## [1] "Rubio, Marco Median Contribution Amount: $80"
## [1] "Santorum, Richard J. Median Contribution Amount: $200"
## [1] "Perry, James R. (Rick) Median Contribution Amount: $1000"
## [1] "Carson, Benjamin S. Median Contribution Amount: $50"
## [1] "Cruz, Rafael Edward 'Ted' Median Contribution Amount: $50"
## [1] "Paul, Rand Median Contribution Amount: $50"
## [1] "Clinton, Hillary Rodham Median Contribution Amount: $25"
## [1] "Sanders, Bernard Median Contribution Amount: $27"
## [1] "Fiorina, Carly Median Contribution Amount: $100"
## [1] "Huckabee, Mike Median Contribution Amount: $62.5"
## [1] "Pataki, George E. Median Contribution Amount: $1000"
## [1] "O'Malley, Martin Joseph Median Contribution Amount: $250"
## [1] "Graham, Lindsey O. Median Contribution Amount: $500"
## [1] "Bush, Jeb Median Contribution Amount: $500"
## [1] "Trump, Donald J. Median Contribution Amount: $64"
## [1] "Jindal, Bobby Median Contribution Amount: $2700"
## [1] "Christie, Christopher J. Median Contribution Amount: $1000"
## [1] "Walker, Scott Median Contribution Amount: $250"
## [1] "Stein, Jill Median Contribution Amount: $100"
## [1] "Webb, James Henry Jr. Median Contribution Amount: $250"
## [1] "Kasich, John R. Median Contribution Amount: $250"
## [1] "Gilmore, James S III Median Contribution Amount: $750"
## [1] "Lessig, Lawrence Median Contribution Amount: $250"
## [1] "Johnson, Gary Median Contribution Amount: $100"
## [1] "McMullin, Evan Median Contribution Amount: $100"
## [1] "Rubio, Marco Average Contribution Amount: $298.643873432938"
## [1] "Santorum, Richard J. Average Contribution Amount: $614.813691236216"
## [1] "Perry, James R. (Rick) Average Contribution Amount: $1233.87950440529"
## [1] "Carson, Benjamin S. Average Contribution Amount: $105.793331547334"
## [1] "Cruz, Rafael Edward 'Ted' Average Contribution Amount: $100.984092266415"
## [1] "Paul, Rand Average Contribution Amount: $181.977299061105"
## [1] "Clinton, Hillary Rodham Average Contribution Amount: $147.01968845272"
## [1] "Sanders, Bernard Average Contribution Amount: $44.706139027879"
## [1] "Fiorina, Carly Average Contribution Amount: $226.324144208973"
## [1] "Huckabee, Mike Average Contribution Amount: $364.101526315789"
## [1] "Pataki, George E. Average Contribution Amount: $1467.00421511628"
## [1] "O'Malley, Martin Joseph Average Contribution Amount: $747.775392138424"
## [1] "Graham, Lindsey O. Average Contribution Amount: $834.992954287014"
## [1] "Bush, Jeb Average Contribution Amount: $1070.51119216538"
## [1] "Trump, Donald J. Average Contribution Amount: $158.841822780055"
## [1] "Jindal, Bobby Average Contribution Amount: $1653.06295424837"
## [1] "Christie, Christopher J. Average Contribution Amount: $1310.35852450817"
## [1] "Walker, Scott Average Contribution Amount: $684.645453802126"
## [1] "Stein, Jill Average Contribution Amount: $226.037764065336"
## [1] "Webb, James Henry Jr. Average Contribution Amount: $549.0580125"
## [1] "Kasich, John R. Average Contribution Amount: $557.681818966193"
## [1] "Gilmore, James S III Average Contribution Amount: $1164.88306818182"
## [1] "Lessig, Lawrence Average Contribution Amount: $464.148244958925"
## [1] "Johnson, Gary Average Contribution Amount: $264.938420582322"
## [1] "McMullin, Evan Average Contribution Amount: $214.026457364341"
It seems that indeed, more established candidates and relatively obscure candidates tended to have higher average contribution amounts, with the notable exceptions of Marco Rubio and Ted Cruz. Here we see that indeed, the average financial contribution amount to Bernie Sanders is about 1/3 of that to Hillary Clinton. Also, nationwide averages of financial contributions to relatively obscure individuals such as to James Gilmore and George Pataki are not as extreme as in California, perhaps because we now have a larger “pool” of contributions from which to make calculations. Again, we see that the most prominent Independent candidates (Jill Stein and Gary Johnson; McMullin did not feature very prominently) enjoyed relatively low but still respectable financial contribution amounts, with greater average and median amounts than the two most prominent Democrats.
Let’s look at financial contributions at a finer level. We’ll examine the individuals that made the most contributions.
## [1] "PETIT, MICHAEL"
##
## Clinton, Hillary Rodham
## 473
## [1] "MITCHELL, MARCIA"
##
## Clinton, Hillary Rodham
## 409
## [1] "SMITH, CHERYL"
##
## Clinton, Hillary Rodham
## 375
## [1] "SAMATUA, DENISE"
##
## Clinton, Hillary Rodham
## 373
## [1] "CARROLL, TERI"
##
## Clinton, Hillary Rodham
## 352
## [1] "ADAM, MONIQUE"
##
## Clinton, Hillary Rodham
## 321
## [1] "MONTANELLI, TERESA"
##
## Clinton, Hillary Rodham
## 317
## [1] "WHITE, DEBORAH"
##
## Clinton, Hillary Rodham Sanders, Bernard
## 272 15
## [1] "REYNOLDS, MARK"
##
## Clinton, Hillary Rodham Cruz, Rafael Edward 'Ted'
## 280 2
## [1] "MEISTER, JACOB"
##
## Clinton, Hillary Rodham
## 279
## [1] "WEIL, MONIQUE"
##
## Sanders, Bernard
## 257
## [1] "NGUYEN, DANH"
##
## Clinton, Hillary Rodham
## 251
## [1] "PENDERGAST, JAN"
##
## Cruz, Rafael Edward 'Ted'
## 245
## [1] "RAFAEL, EMERITA"
##
## Clinton, Hillary Rodham
## 242
## [1] "MCLENNAN, MARLYN"
##
## Sanders, Bernard
## 240
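A listing like the one above could be produced by ranking contributor names by frequency and then tabulating each top name by candidate. The sketch below uses toy data and illustrative object names. Note that it groups by name only, so distinct people who share a name are pooled.

```r
# Toy data (illustration only)
ca_contribs <- data.frame(
  contbr_nm = c("DOE, JANE", "DOE, JANE", "DOE, JANE", "ROE, RICHARD",
                "ROE, RICHARD", "POE, ALEX"),
  cand_nm   = c("Candidate A", "Candidate A", "Candidate B", "Candidate A",
                "Candidate A", "Candidate B"),
  stringsAsFactors = FALSE
)

# Names ranked by number of contributions, most frequent first
top_names <- names(sort(table(ca_contribs$contbr_nm), decreasing = TRUE))[1:2]

# For each top name, tabulate contributions by candidate
for (nm in top_names) {
  print(nm)
  print(table(ca_contribs$cand_nm[ca_contribs$contbr_nm == nm]))
}
```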
It looks like those responsible for the most financial contributions (by count, not by amount) contributed almost exclusively to Clinton. A few individuals also contributed, relatively infrequently, to two other candidates, Bernie Sanders and Ted Cruz. The fact that one contributor is associated with contributions to both a Democratic and a Republican candidate is interesting. The count is also so large (about 280) that the person would have had to make a contribution roughly every 2 days (the primary season is about 1.5 years long). Let’s examine this individual, Mark Reynolds, in more detail.
## contbr_employer
## cand_nm BEN-E-LECT THE NATURE CONSERVANCY VISA, INC.
## Clinton, Hillary Rodham 0 2 268
## Cruz, Rafael Edward 'Ted' 2 0 0
It seems the Mark Reynolds who contributed to Clinton’s campaign is different from the Mark Reynolds who contributed to Cruz’s campaign. Further, it seems that multiple individuals with the name Mark Reynolds may have contributed to Clinton’s campaign. It is quite possible either that the same individual switched jobs or that multiple individuals share the same name. Without a unique identifier for each contributor, it is difficult to determine how many individuals we’re dealing with here. It does seem curious that one individual would make so many financial contributions. Perhaps this is indicative of a dishonest practice. To be fair, though, this could be common practice. Affluent contributors might choose to make a large contribution at one time, perhaps at a fundraising dinner or other event, whereas less affluent contributors may merely choose to allocate a small fraction of every paycheck received on a monthly basis. Again, the details of this tendency aren’t illuminated exactly in this dataset, as we aren’t given a unique identifier for contributors, but the possible existence of this practice at least suggests further investigation.
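The candidate-by-employer breakdown discussed above can be sketched as a simple cross-tabulation. The data here is made up, with “DOE, JANE” standing in for a real contributor name.

```r
# Toy data (illustration only): one name, several employers
ca_contribs <- data.frame(
  contbr_nm = rep("DOE, JANE", 5),
  cand_nm = c("Candidate A", "Candidate A", "Candidate A",
              "Candidate B", "Candidate A"),
  contbr_employer = c("LARGE CO", "LARGE CO", "SMALL LLP", "OTHER INC", NA),
  stringsAsFactors = FALSE
)

one_name <- ca_contribs[ca_contribs$contbr_nm == "DOE, JANE", ]

# Candidate-by-employer counts; useNA = "ifany" keeps rows with no
# recorded employer visible instead of silently dropping them
xtab <- table(one_name$cand_nm, one_name$contbr_employer, useNA = "ifany")
print(xtab)
```

Without `useNA`, contributions lacking an employer are omitted from the table, which can make the cell counts sum to less than the contributor’s total.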
I’d like to take a closer look. It seems we have a good chance of isolating a single individual if we subset also by the employer. To be fair, it seems isolation is more likely if we select a smaller company. Visa, Inc. is not a small company so we may be better off choosing another individual. Let’s try using “MEISTER, JACOB,” for the name seems less common. First, we’ll check to see if the employer associated with the most contributions is a smaller employer, as such an employer is less likely to employ multiple individuals with the same name.
## contbr_employer
## cand_nm NOLAN BARTON BRADFORD OLMOS LLP
## Clinton, Hillary Rodham 279
Terrific - it seems like “NOLAN BARTON BRADFORD OLMOS LLP” is a much smaller employer and we’re thus less likely to be looking at multiple individuals. Let’s look at the days on which he made contributions.
## contb_receipt_dt
## 2001-03-16 2001-11-16 2002-02-16 2002-03-16 2002-10-16 2002-11-16
## 2 1 2 1 1 1
## 2003-03-16 2003-06-16 2003-11-16 2004-02-16 2004-04-16 2004-05-16
## 2 1 2 1 1 1
## 2004-06-16 2004-11-16 2005-02-16 2005-03-16 2005-04-16 2005-06-16
## 3 3 1 1 2 2
## 2005-11-16 2006-03-16 2006-04-16 2006-06-16 2006-07-16 2006-08-16
## 3 1 2 2 1 3
## 2006-09-16 2006-11-16 2007-02-16 2007-03-16 2007-04-16 2007-07-16
## 2 2 1 2 1 1
## 2007-11-16 2008-02-16 2008-03-16 2008-04-16 2008-11-16 2009-02-16
## 4 2 2 2 1 4
## 2009-04-16 2009-05-16 2009-09-16 2010-03-16 2010-04-16 2010-05-16
## 1 1 1 3 1 1
## 2011-02-16 2011-03-16 2011-05-16 2011-06-16 2011-08-16 2012-03-16
## 2 1 2 1 2 2
## 2012-04-16 2012-05-16 2012-07-16 2013-02-16 2013-03-16 2013-05-16
## 2 1 1 1 2 2
## 2013-07-16 2013-08-16 2014-03-16 2014-04-16 2014-07-16 2015-02-16
## 1 1 3 3 2 1
## 2015-03-16 2015-04-16 2015-05-16 2015-06-16 2015-07-16 2015-09-16
## 2 1 2 1 1 3
## 2016-03-16 2016-04-16 2016-05-16 2016-06-16 2016-07-16 2016-09-16
## 2 2 2 2 1 2
## 2017-02-16 2017-03-16 2017-04-16 2017-05-16 2017-06-16 2017-08-16
## 2 2 1 3 1 1
## 2017-10-16 2018-02-16 2018-03-16 2018-04-16 2018-07-16 2018-09-16
## 1 3 1 3 2 1
## 2018-10-16 2019-02-16 2019-04-16 2019-05-16 2019-06-16 2019-08-16
## 1 2 1 4 2 1
## 2019-10-16 2020-02-16 2020-04-16 2020-05-16 2020-07-16 2020-08-16
## 1 3 1 1 1 1
## 2021-02-16 2021-05-16 2021-09-16 2021-10-16 2022-02-16 2022-04-16
## 1 3 1 1 1 1
## 2022-06-16 2022-07-16 2023-02-16 2023-03-16 2023-04-16 2023-05-16
## 1 2 3 1 1 2
## 2024-02-16 2024-03-16 2024-04-16 2024-05-16 2024-07-16 2024-09-16
## 2 1 3 2 1 1
## 2025-02-16 2025-03-16 2025-04-16 2025-05-16 2025-07-16 2026-02-16
## 1 1 1 3 1 1
## 2026-03-16 2026-04-16 2026-05-16 2026-09-16 2026-10-16 2027-02-16
## 1 2 4 1 1 3
## 2027-03-16 2027-04-16 2027-05-16 2027-06-16 2027-07-16 2027-08-16
## 1 1 5 1 3 1
## 2027-10-16 2028-02-16 2028-03-16 2028-05-16 2028-07-16 2028-08-16
## 1 2 1 6 2 1
## 2028-10-16 2029-02-16 2029-03-16 2029-04-16 2029-05-16 2029-06-16
## 1 5 2 2 4 2
## 2029-07-16 2029-09-16 2030-03-16 2030-04-16 2030-05-16 2030-06-16
## 1 1 3 2 2 3
## 2030-09-16 2031-03-16 2031-05-16 2031-07-16 2031-08-16 2031-10-16
## 1 5 4 3 1 1
The above seems worthy of attention. We see dates as early as 2001, which may not be surprising since many expected her to run (2016 is oddly specific, though). Dates in 2017 and beyond seem strange, though. We’ll assume that the FEC may recognize commitments to payments at future dates and will issue receipts for such payments. A parsing artifact should also be ruled out: if the raw dates use a two-digit year (e.g., “01-MAR-16”) and were read in year-month-day order, the day of the month would be taken as the year, which would produce this same pattern of years running from 2001 to 2031 with the day of the month always equal to 16.
It’s difficult to imagine such future payments being made: had Clinton won the 2016 election and been reelected in 2020, she would not have been able to run in any election after that. For all we know, she might be out of public office, living in another country, or otherwise indisposed after 2024. However, it seems Meister has made payments to Clinton’s presidential campaign dated as late as 2031. Further, it was commonly assumed that Clinton would win the 2016 election, and if that were true, winning the 2020 election would be somewhat likely, assuming that she’d be as great a president as her ardent supporters believed. Yet Meister is most active in the sphere of financial contributions during years beyond 2024 (i.e., by what seems to be the logical reasoning of a passionate supporter, the point at which Clinton would be out of the presidential office and no longer able to run for the position).
If we look at the tabular results, we see that receipts are for payments on the 16th of each month. That would render future payments more credible, as one may subscribe to a recurring monthly payment (perhaps all candidates have a similar system in place, as the contribution receipt dates for all candidates are typically on the 15th or 16th of each month, as the below cell shows). Less credible, however, is the fact that there are often receipts for four or more payments each day (e.g., 6 on May 16, 2028 (almost 12 years after her 2016 campaign!), 5 exactly 9 months from then, 4 another 2 months from then, etc.). The data we are presently analyzing isn’t sufficient for us to derive any definite conclusions regarding integrity in this arena, but it certainly does raise suspicion.
## [1] "2001-01-15" "2001-01-16" "2001-02-15" "2001-02-16" "2001-03-15"
## [6] "2001-03-16" "2001-04-15" "2001-04-16" "2001-05-15" "2001-05-16"
## [11] "2001-06-15" "2001-06-16" "2001-07-15" "2001-07-16" "2001-08-15"
## [16] "2001-08-16" "2001-09-15" "2001-09-16" "2001-10-15" "2001-10-16"
## [21] "2001-11-15" "2001-11-16" "2001-12-14" "2001-12-15" "2001-12-16"
## [26] "2002-01-16" "2002-02-15" "2002-02-16" "2002-03-16" "2002-04-15"
## [31] "2002-04-16" "2002-05-15" "2002-05-16" "2002-06-15" "2002-06-16"
## [36] "2002-07-15" "2002-07-16" "2002-08-15" "2002-08-16" "2002-09-15"
## [41] "2002-09-16" "2002-10-15" "2002-10-16" "2002-11-15" "2002-11-16"
## [46] "2002-12-15" "2002-12-16" "2003-01-16" "2003-02-15" "2003-02-16"
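The per-date tally discussed above can be sketched in base R as follows. This is a minimal illustration rather than the code behind the output shown: the `receipt_dates` vector is made-up example data, and the threshold of four receipts per date simply mirrors the figure mentioned in the text.

```r
# Hypothetical receipt dates for a single contributor (not taken from the FEC file).
receipt_dates <- as.Date(c("2028-05-16", "2028-05-16", "2028-05-16",
                           "2028-05-16", "2028-07-16", "2029-02-16",
                           "2029-02-16", "2016-03-15"))

# Number of receipts recorded on each distinct date.
receipts_per_date <- table(receipt_dates)

# Dates that carry at least `threshold` receipts (threshold is an assumption).
threshold <- 4
busy_dates <- names(receipts_per_date)[as.vector(receipts_per_date) >= threshold]

# How often each day of the month appears across all receipts.
day_of_month <- table(as.integer(format(receipt_dates, "%d")))

print(receipts_per_date)
print(busy_dates)
print(day_of_month)
```

Tabulating the day of the month separately makes it easy to confirm whether receipts really do cluster on the 15th and 16th.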
Next, we have a frequency plot that makes some of this information easier to visualize. We’ll use a histogram rather than a line plot because a line would imply that intermediate numbers of contribution receipts occurred on dates for which no receipts exist. The code for a line graph is included, though, in case the distinction isn’t clear.
## Min. 1st Qu. Median Mean 3rd Qu.
## "2001-03-16" "2010-03-31" "2018-04-16" "2018-03-30" "2027-02-16"
## Max.
## "2031-10-16"
For some perspective, let’s take a look at the financial contributions to all candidates from all financial contributors.
## cand_nm: Bush, Jeb
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2013 2020 2019 2028 2032
## --------------------------------------------------------
## cand_nm: Carson, Benjamin S.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2010 2017 2017 2024 2032
## --------------------------------------------------------
## cand_nm: Christie, Christopher J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2009 2017 2017 2026 2032
## --------------------------------------------------------
## cand_nm: Clinton, Hillary Rodham
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2008 2017 2017 2026 2032
## --------------------------------------------------------
## cand_nm: Cruz, Rafael Edward 'Ted'
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2006 2016 2016 2025 2032
## --------------------------------------------------------
## cand_nm: Fiorina, Carly
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2009 2018 2017 2025 2032
## --------------------------------------------------------
## cand_nm: Gilmore, James S III
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2004 2008 2013 2013 2018 2023
## --------------------------------------------------------
## cand_nm: Graham, Lindsey O.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2008 2017 2017 2025 2032
## --------------------------------------------------------
## cand_nm: Huckabee, Mike
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2011 2018 2018 2026 2032
## --------------------------------------------------------
## cand_nm: Jindal, Bobby
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2003 2013 2024 2020 2025 2032
## --------------------------------------------------------
## cand_nm: Johnson, Gary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2010 2017 2017 2026 2032
## --------------------------------------------------------
## cand_nm: Kasich, John R.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2010 2016 2017 2024 2032
## --------------------------------------------------------
## cand_nm: Lessig, Lawrence
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2002 2011 2017 2017 2024 2032
## --------------------------------------------------------
## cand_nm: McMullin, Evan
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2002 2007 2015 2016 2026 2032
## --------------------------------------------------------
## cand_nm: O'Malley, Martin Joseph
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2011 2019 2018 2026 2032
## --------------------------------------------------------
## cand_nm: Pataki, George E.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2011 2013 2017 2019 2028 2029
## --------------------------------------------------------
## cand_nm: Paul, Rand
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2011 2019 2018 2027 2032
## --------------------------------------------------------
## cand_nm: Perry, James R. (Rick)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2005 2025 2025 2025 2029 2030
## --------------------------------------------------------
## cand_nm: Rubio, Marco
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2010 2016 2017 2026 2032
## --------------------------------------------------------
## cand_nm: Sanders, Bernard
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2009 2017 2017 2026 2032
## --------------------------------------------------------
## cand_nm: Santorum, Richard J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2006 2020 2017 2030 2032
## --------------------------------------------------------
## cand_nm: Stein, Jill
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2012 2022 2019 2026 2032
## --------------------------------------------------------
## cand_nm: Trump, Donald J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2009 2013 2015 2022 2032
## --------------------------------------------------------
## cand_nm: Walker, Scott
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2002 2010 2017 2017 2025 2032
## --------------------------------------------------------
## cand_nm: Webb, James Henry Jr.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2001 2005 2014 2014 2024 2030
From this plot, we see that it’s actually quite common for contribution receipts to be dated some time into the future. With the exceptions of James Gilmore III, Bobby Jindal, Marco Rubio, Donald Trump, and Jim Webb, it seems candidates typically have as many or more future contributions than current or past contributions. It’s quite possible that these represent pledged contributions, which would partially explain Jacob Meister’s having receipts for payments on future dates. If he is already subscribed to monthly payments, and if campaign employees frequently solicit further contributions from him and/or he often decides on his own to give more to Clinton, Meister may simply add another committed payment to his subscription each time and, for the sake of convenience, use a date on which he already has a commitment.
It seems worthwhile to look more closely at the financial contributions made to another candidate. Above, we found that “PENDERGAST, JAN” also contributed many times (245, to be exact) to another candidate - Ted Cruz. Let’s take a look at her financial contributions.
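One way to put a number on "future" versus "current or past" contributions is to compute, per candidate, the share of receipts dated after some cutoff. The sketch below is only illustrative: the data frame is invented, the column names `cand_nm` and `contb_receipt_dt` follow the dataset's fields (with dates assumed to be already parsed as `Date`), and the end-of-2016 cutoff is an assumption rather than a value from the analysis.

```r
# Hypothetical contributions; candidate labels "A" and "B" are placeholders.
contribs <- data.frame(
  cand_nm = c("A", "A", "A", "B", "B", "B", "B"),
  contb_receipt_dt = as.Date(c("2015-06-15", "2016-03-16", "2027-05-16",
                               "2016-01-15", "2016-02-16", "2016-09-16",
                               "2030-04-16"))
)

# Assumed cutoff: anything dated after 2016 counts as a "future" receipt.
cutoff <- as.Date("2016-12-31")

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of future-dated receipts per candidate.
future_share <- tapply(contribs$contb_receipt_dt > cutoff,
                       contribs$cand_nm, mean)

print(future_share)
```

A share of 0.5 or more for a candidate would correspond to "as many or more future contributions than current or past contributions."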
## contb_receipt_amt
## -500 -100 -50 -35 -25 -15 -10 -5 5 10 15 20 25 30 35
## 2 1 10 4 17 1 2 1 3 7 1 2 52 1 13
## 45 50 100 500
## 1 25 3 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2017 2021 2027 2025 2029 2032
Pendergast was also recorded as making multiple contributions on the same day (as many as 24!). A notable difference with Pendergast is that she often has identical numbers of transactions on similar dates, as the numerous pairs of identical bars in the above plot show. From a naive standpoint, it may seem strange for people to have many contribution receipts dated in the future, with many receipts sharing dates, though perhaps it is not uncommon.
It is trivial to assert that all supporters made at least one contribution, but one might also like to know how many contributions supporters made per day, on average, as the practice by Meister and Pendergast of making multiple contributions on a single day seems noteworthy. Thus, we’ll calculate the average daily number of contributions made by each recorded supporter, counting only the days on which that supporter has at least one contribution receipt. Contributors will be categorized by this average into the buckets (0, 1] (at most one contribution per such day), (1, 2], and (2, \(\infty\)); we then divide the count in each bucket by the total number of contributors so as to obtain proportions. In this way, we’ll be able to assess the proportion of contributors who averaged at most one contribution per day, more than one but at most two contributions per day (this should cover erroneous contributions), and more than two contributions per day. Then, we’ll look at the raw numerical results and plot them.
## Bucket Proportion
## 1 (0,1] 0.465848625
## 2 (1,2] 0.526990561
## 3 (2,Inf] 0.007160814
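The bucketing described above can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the code that produced the table: the data frame is invented, and contributors are identified by `contbr_nm` alone, which assumes names are unique.

```r
# Hypothetical contributions from three made-up contributors.
contribs <- data.frame(
  contbr_nm = c("DOE, A", "DOE, A", "DOE, A", "ROE, B", "ROE, B",
                "POE, C", "POE, C", "POE, C", "POE, C", "POE, C"),
  contb_receipt_dt = as.Date(c("2016-03-16", "2016-03-16", "2016-04-16",
                               "2016-05-15", "2016-06-15",
                               "2016-07-16", "2016-07-16", "2016-07-16",
                               "2016-07-16", "2016-07-16"))
)

# Total contributions and number of distinct contribution days per contributor.
n_contribs <- tapply(contribs$contb_receipt_dt, contribs$contbr_nm, length)
n_days <- tapply(contribs$contb_receipt_dt, contribs$contbr_nm,
                 function(d) length(unique(d)))

# Average contributions per day, over days with at least one contribution.
avg_per_day <- as.vector(n_contribs / n_days)

# Right-closed buckets, matching the labels (0,1], (1,2], (2,Inf].
buckets <- cut(avg_per_day, breaks = c(0, 1, 2, Inf))

# Proportion of contributors falling in each bucket.
proportions <- prop.table(table(buckets))
print(proportions)
```

Because every contributor has at least one receipt on each day counted, the average can never fall below one, so the (0,1] bucket in practice holds contributors who never gave more than once on any day.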
The faceted plot of the number of contributions received over time by all candidates shows that almost every candidate has contributions with receipts dated far after their campaigns, whereas the plot immediately above shows that supporters typically average two or fewer contributions on each day that they make a contribution. We actually see that an average of more than one contribution per day (about 52.7% of contributors) is more common than an average of exactly one (about 46.6%); it is impossible for a recorded contributor to average less than one contribution per day over the days on which that contributor makes a contribution. The fact that such a small percentage (approximately 0.7%) of contributors average more than two contributions per day tells us that those who have contributed more than two times on one day tend to average between one and two contributions per day, inclusive (activity that seems more credible), and/or are few and far between.
Thus, at this point, the practice of making more than one contribution per day piques curiosity but doesn’t arouse suspicion. It still seems especially strange that Pendergast often has the same number of receipts on two dates (i.e., pairs of identical bars in the plot above), though I may be making much ado about nothing.
Let’s move on, then. We’ll finally take a look at the number of contributions over time and space. We already looked at variations in the quantity of contributions over time above, but it also seems likely that people in the same locale will be moved by the same locally specific events (e.g., a candidate promising, if elected, to provide funding or amenities for a purpose near and dear to the hearts of the local population), and that the populations of different locales will therefore display shifting activity with respect to financial contributions over time.
## [1] "LANCASTER" "NAPA"
## [3] "TIBURON" "SAN LUIS OBISPO"
## [5] "BURLINGAME" "MURRIETA"
## [7] "PALO ALTO" "SUNNYVALE"
## [9] "DANA POINT" "SAN JUAN CAPISTRANO"
## [11] "YUCCA VALLEY" "NEWPORT BEACH"
## [13] "PALOS VERDES ESTATES" "COSTA MESA"
## [15] "SHINGLETOWN" "LOMITA"
## [17] "ELK GROVE" "VAN NUYS"
## [19] "SAN JOSE" "FRESNO"
## [21] "LOS ANGELES" "SAN DIEGO"
## [23] "GLENDORA" "CARLSBAD"
## [25] "CHINO HILLS" "IRVINE"
## [27] "VENICE" "RAMONA"
## [29] "MORGAN HILL" "PITTSBURG"
## [31] "BAKERSFIELD" "LA MESA"
## [33] "SAN MATEO" "LAGUNA HILLS"
## [35] "ATASCADERO" "BEVERLY HILLS"
## [37] "SANTA BARBARA" "SANTA MARGARITA"
## [39] "PASADENA" "IMPERIAL"
## [41] "RANCHO CUCAMONGA" "PALOS VERDES PENINSULA"
## [43] "ALPINE" "LAKE ARROWHEAD"
## [45] "WATSONVILLE" "MISSION VIEJO"
## [47] "SANTA ANA" "MARTINEZ"
## [49] "TORRANCE" "SAN LEANDRO"
## [51] "TEMPLETON" "ARROYO GRANDE"
## [53] "EUREKA" "SANTA ROSA"
## [55] "ANTIOCH" "CONCORD"
## [57] "RANCHO SANTA FE" "HERMOSA BEACH"
## [59] "LOOMIS" "GRANITE BAY"
## [61] "SAN FRANCISCO" "WATERFORD"
## [63] "SACRAMENTO" "LIVERMORE"
## [65] "FREMONT" "ENCINO"
## [67] "LOS GATOS" "HOOPA"
## [69] "BREA" "OAKLAND"
## [71] "SANTA CRUZ" "SAUSALITO"
## [73] "HUNTINGTON BEACH" "ONTARIO"
## [75] "PACIFIC PALISADES" "HACIENDA HEIGHTS"
## [77] "RCH PALOS VRD" "SOUTH SAN FRANCISCO"
## [79] "SYLMAR" "WEST HOLLYWOOD"
## [81] "PERRIS" "PORTOLA VALLY"
## [83] "CANOGA PARK" "ROCKLIN"
## [85] "SALINAS" "PALM SPRINGS"
## [87] "THOUSAND OAKS" "SAN CARLOS"
## [89] "MONTEBELLO" "VALLEJO"
## [91] "BERKELEY" "CAMBRIA"
## [93] "MOUNTAIN VIEW" "BURBANK"
## [95] "CARMICHAEL" "CAYUCOS"
## [97] "ANAHEIM" "MANTECA"
## [99] "ENCINITAS" "FILLMORE"
## [1] 1304346
## [1] 839
## [1] 13
In the above, we look at the density of contributions over space (i.e., where in California we see significant proportions of associated contributions). If we look at the above graph, we see that the number of contributions made in the major metropolitan areas around the San Francisco Bay Area, Sacramento, Los Angeles, and San Diego so greatly dwarfs the number of contributions made in other parts of the state as to render the latter arguably negligible in quantity. To be sure, contributions were made in less prominent parts of California, as can be seen by a simple glance at the cities associated with donations, as below.
In the above plot, we plot a point for every contribution rather than only indicating the areas with greater proportions of contributions. We also use transparency (the alpha level) to add the size of each financial contribution to the visualization. We see that even when the maximum alpha is divided by 3.5, there were so many contributions associated with the aforementioned major metropolitan areas as to create dark areas. To be fair, contributions made in less prominent regions of the state are still large and/or numerous enough to render areas of the map at least slightly orange. In particular, Fresno and Bakersfield are colored in a decently dark shade of orange.
Another aspect one may be interested in is the candidates to whom contributions were made. We’ll examine that next. For this investigation, we’ll also subset the data to contain only prominent candidates. Such a change doesn’t result in a greatly reduced quantity of data.
In the above, we see a large number of contributions in all corners of the state made to both Sanders and Clinton. We used a low alpha value, but sky blue (the color corresponding to Sanders) is still quite prominent. Frankly, the major metropolitan areas appear as something between brown and gray (the hue will henceforth be referred to as “brown”), which is the result of blending all of these disparate colors over a large number of recorded contributions (the alpha level was low, but the contributions were many). On the one hand, the color blue, as well as a smattering of purple (representing Stein and, arguably, Trump to some extent), stands out in the above plot, which is a testament to how strongly blue the state is. On the other hand, many regions are more brown than purple (purple being both the combination of blue and red and the color corresponding to Jill Stein in the above plot), even with the low alpha value used, which suggests that the population of California is perhaps more politically diverse than some may believe.
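To illustrate why dense areas still turn dark at a low alpha, here is a small sketch of one way to map contribution size to transparency. It is an assumption about the scaling, not the plotting code used above: the amounts are invented, the linear rescaling is hypothetical, and only the divisor of 3.5 comes from the text. The `stacked_opacity()` helper is also hypothetical and uses the standard alpha-compositing result that `n` overlapping points of opacity `a` combine to `1 - (1 - a)^n`.

```r
# Hypothetical positive contribution amounts in dollars.
amounts <- c(5, 25, 100, 2700)

# Cap the alpha scale at 1 / 3.5 so that even the largest contribution
# is drawn fairly transparently (the divisor 3.5 is taken from the text).
max_alpha <- 1 / 3.5

# Assumed linear mapping from amount to alpha.
alpha <- amounts / max(amounts) * max_alpha

# Combined opacity of n overlapping points that each have opacity a.
stacked_opacity <- function(a, n) 1 - (1 - a)^n

print(round(alpha, 4))
print(stacked_opacity(0.01, 500))
```

Even nearly transparent points approach full opacity once several hundred of them overlap, which is consistent with the dark patches over the major metropolitan areas.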
## [1] "List of locales with which contributions are associated:"
## [1] "LANCASTER" "LOS ANGELES"
## [3] "HOLLYWOOD" "WEST HOLLYWOOD"
## [5] "BELL" "BEVERLY HILLS"
## [7] "CULVER CITY" "HAWTHORNE"
## [9] "HERMOSA BEACH" "MANHATTAN BEACH"
## [11] "PACIFIC PALISADES" "PALOS VERDES ESTATES"
## [13] "ROLLING HILLS ESTATES" "PALOS VERDES PENINSULA"
## [15] "RCH PALOS VRD" "REDONDO BEACH"
## [17] "TOPANGA" "VENICE"
## [19] "MARINA DEL REY" "SANTA MONICA"
However, the financial sum contributed may also only be significant, for the most part, in the aforementioned metropolitan areas.
Let’s now look at how the frequency of receipt dates of financial contributions changed over time. We’ll have five time categories:
Category 1: all dates before 2015. The first declarations of candidacy didn’t occur until March of 2015 so contributions prior to this date are likely speculative at best if we assume the mass public doesn’t possess insider knowledge.
Category 2: all dates both before February 1, 2016 and not in Category 1. The first of the primaries and caucuses occurred on February 1, 2016. One might want one category to include dates after the declarations and withdrawals of all relevant candidates, but there is no earlier date by which all relevant candidates had declared and/or withdrawn, since many ran as Republican candidates and many Republican candidates withdrew throughout the primary season.
Category 3: all dates both before July 2016 and not in Categories 1 or 2. The final primaries and caucuses were held in June 2016.
Category 4: all dates both before December 2016 and not in Categories 1 through 3. Election Day was November 8, 2016.
Category 5: all dates not in the preceding categories. For reference, the electoral votes were formally counted in January 2017, and this category includes all dates after that.
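The five categories above can be assigned with a few lines of base R. This is a sketch rather than the original implementation: the boundary dates are the ones listed in the category definitions, while the helper name `time_category()` and the example dates are made up for illustration.

```r
# Start date of Categories 2 through 5, taken from the definitions above.
category_starts <- as.Date(c("2015-01-01",   # Category 2 begins
                             "2016-02-01",   # Category 3 begins
                             "2016-07-01",   # Category 4 begins
                             "2016-12-01"))  # Category 5 begins

# Hypothetical helper: returns an integer category (1 to 5) for each date.
time_category <- function(dates) {
  # findInterval() counts how many start dates are <= each date (0 to 4),
  # so adding 1 yields the category number.
  findInterval(as.numeric(dates), as.numeric(category_starts)) + 1L
}

example_dates <- as.Date(c("2014-12-31", "2015-01-01", "2016-01-31",
                           "2016-02-01", "2016-06-30", "2016-11-08",
                           "2016-12-01", "2031-10-16"))
categories <- time_category(example_dates)
print(data.frame(date = example_dates, category = categories))
```

Defining each category only by its start date guarantees the categories are mutually exclusive and cover every date, which is what the "not in the preceding categories" wording is meant to ensure.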
Contribution activity shows only slight changes in space over time. The areas immediately surrounding the metropolitan areas of the SF Bay Area, Sacramento, Los Angeles, and San Diego are always far more active than other parts of the state. As seen before, contributions are associated with other parts of the state, but these are relatively trivial in quantity.
We see that when the primary season began, the number of financial contributions increased significantly, as is evidenced by the numeric values on the legend of fill gradient colors. During the primary season and up until the conclusion of the general election, not much change is apparent, except that the areas associated with contribution receipts become slightly more concentrated around major metropolitan areas.
There are apparent changes in activity over time, though. If anything, there appears to be a slight decrease in contributions over time, perhaps because supporters are growing fatigued.
It might be interesting to see how support for each candidate changed over time, but the number of contributions herein doesn’t seem like a suitable proxy for that. It seems many contributions are made for purposes other than the respective 2016 campaigns.
In conducting this investigation, I was particularly fond of the plots below.
This first plot shows us the proportion of contributions received by Clinton and Sanders that were of particular financial amounts.
Multiple interpretations can be made from the above. On the one hand, it seems quite apparent that a greater portion of contributions to Sanders were of smaller quantities. However, one may also argue that the difference doesn’t appear to be as significant as it is often painted to be in the media. In that sense, this plot inspires curiosity.
This next plot also inspires curiosity and brings into question the integrity of at least some contributions.
We see contributions (in this case, exclusively for Ted Cruz) made by one individual - Jan Pendergast. It seems that many contributions come in pairs, with one being for a positive amount and the other being for the negative of that amount. Further, these contributions have receipts dated all the way into the distant future and, even on those dates, seem to occur with significant frequency. This is not definitive evidence of wrongdoing but does, at the least, arouse curiosity.
In the following, we are able to see frequency across time and space. Above, we used five individual plots instead of faceting. Here we use faceting so that we’re looking at just one plot. We’ll start by subsetting the data so that we only get plots of contributions before the primary season, at the end of the election season, and after the election season, and thus view only a fraction of the above graphs.
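A rough way to check how many contributions could be offset by an equal negative entry is to count, for each dollar magnitude, the positive and negative occurrences. The sketch below uses invented amounts and matches on amount alone, which is an assumption: it shows how many pairs are possible, not that any particular refund belongs to any particular contribution.

```r
# Hypothetical contribution amounts for one contributor; negatives are refunds.
amounts <- c(25, -25, 25, 50, -50, 35, 100, -10)

# Distinct dollar magnitudes, ignoring sign.
magnitudes <- sort(unique(abs(amounts)))

# Count positive and negative entries for each magnitude.
n_positive <- sapply(magnitudes, function(m) sum(amounts == m))
n_negative <- sapply(magnitudes, function(m) sum(amounts == -m))

# At most min(positive, negative) entries can be paired off per magnitude.
paired <- data.frame(amount = magnitudes,
                     n_positive = n_positive,
                     n_negative = n_negative,
                     n_matched = pmin(n_positive, n_negative))

print(paired)
```

Comparing `n_matched` with the total number of negative entries gives a quick sense of whether the refunds could all be reversals of earlier contributions of the same size.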
On the one hand, this visualization is satisfying to some extent because it seems to cover so many dimensions in such a way that one may comprehend the visualization quite rapidly yet still learn quite a bit. We see that although contributions are associated with disparate other parts of the state, such a significant number are associated with Sacramento, the SF Bay Area, Los Angeles, and San Diego that these other contributions are almost trivial.
Also significant is the impression that this activity has not changed after the election, which some characterize as shocking and as a game-changer of sorts. However, this data may not completely forecast future events. Only time will tell.
The first detail I’d like to discuss is the flexibility of ggmap. As I experimented with ggmap, I found that quite a number of locations could be specified - and specification was not strict, much like when using the Google Maps API geocoding feature. This should not come as a surprise, since one of the map sources used is Google Maps. Maps can also be taken from other sources, such as OpenStreetMap, and rendered in different styles, such as satellite imagery. The appearance of points with specified alpha levels, contour lines, and density representations was quite pleasing.
However, there were downsides such as lack of flexibility regarding plot formatting. The easiest way to remove such features as x and y-axis labels (in this case, “lat” and “lon”) and tick labels also removes user-specified later additions, and it is not trivial to completely customize the plot additions to one’s liking.
One challenge that repeatedly came up was that of data limitations. On the one hand, I recognize that this data is “clean” and thus easier to work with; for instance, there are only an insignificant number of blatantly incorrect states. However, it seemed suboptimal to have a significant number of contributions with receipts dated in the future or from several years prior, yet labelled as being for the 2016 primary elections or general elections. Clearly, the purpose of such a contribution could not have been known before the candidate’s announcement of candidacy and, for all one knows, the relevant candidate may be ineligible for presidential office in the future (e.g., the candidate may have served two terms as president by the date of the donation receipt).
Further, I often encountered situations in which I desired further data. Fortunately, the quantity of records in the present data was not lacking. However, from time to time I desired further features, such as unique identification of supporters and indicators of supporter enthusiasm. To be fair, I realize I was still quite fortunate in having access to the extent of the present data.
If the desired data is collected in the future, it could provide grounds for further exploration. For instance, one could more reliably investigate the integrity of contributions. Further, some fields, such as “memo_text”, were not investigated here because of the relative scarcity of non-empty values, and the field “form_tp” was not investigated because so many contributions were likely not made for the claimed purpose that doubt was cast on the integrity of claims regarding the purpose of other contributions. If more information regarding these features is amassed, they could be investigated more thoroughly (e.g., one may check which “memo_text” subjects are most common and how subjects correlate with time and space - i.e., one’s locale).